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Abstract 



The Kalman filter combines forecasts and new observations to obtain an 
estimation which is optimal in the sense of a minimum average quadratic 
error. The Kalman filter has two main restrictions: (i) the dynamical 
system is assumed linear and (ii) forecasting errors and observational 
noises are projected onto Gaussian distributions. Here, we offer an im- 
portant generalization to the case where errors and noises have heavy 
tail distributions such as power laws and Levy laws. The main tool 
needed to solve this "Kalman-Levy" filter is the "tail-covariancc" ma- 
trix which generalizes the covariance matrix in the case where it is 
mathematically ill-defined (i.e. for power law tail exponents /x < 2). 
We present the general solution and discuss its properties on pedagog- 
ical examples. The standard Kalman-Gaussian filter is recovered for 
the case ix = 2. The optimal Kalman-Levy filter is found to deviate 
substantially from the standard Kalman-Gaussian filter as ^ deviates 
from 2. As /X decreases, novel observations are assimilated with less and 
less weight as a small exponent jx implies large errors with significant 
probabilities. In terms of implementation, the price-to-pay associated 
with the presence of heavy tail noise distributions is that the standard 
linear formalism valid for the Gaussian case is transformed into a non- 
linear matrix equation for the Kalman-Levy filter. Direct numerical 
experiments in the univariate case confirms our theoretical predictions. 
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A 



I. INTRODUCTION AND MOTIVATION 



The Kalman filter provides the optimal resolution of the problem of data assim- 
ilation under the hypothesis that the system observables evolve according to linear 
maps, are linearly related to the true variables and that the noise acting on the true 
dynamics and the measurement errors are mutually uncorrelated and Gaussian. 

Two main limitations restrict the performance of the Kalman filter: 

• the nonlinearity of the real system dynamics; 

• the non-normality of the noises. 

The first item has been addressed partly by using so-called "extended" Kalman filters 
that amount essentially to perform local linearizations [jT^|. 



With respect to the non-normality of the noises, the general condition for using 
the Kalman filter is that their covariance functions exist, which is satisfied for noise 
density distributions decaying faster than When the noise distributions are 

not Gaussian, the validity of the Kalman filter relies on the existence of a central 
limit theorem for state estimators, which exists when the random terms in the model 
have arbitrary distribution with tail decaying faster than p6|]. For practical 



applications, the existence of the central limit theorem does not suffice as a finite 
observation time may lead to large deviations from the asymptotic results [^. The 
knowledge of convergence rates in the central limit theorem are then necessary for 
the development of tests of the validity of the model [Q. 

Our purpose here is to extend these results to the regime where the existence of 
the covariance function is not warranted as occurs for Levy distributions of noises, or 
when the covariance functions do exist but the convergence to the asymptotic result 
given by the central limit theorem is extremely slow making the asymptotic result 



useless in practice, as occurs for power law distributions ||25[. In order to illustrate 



this idea, consider the sum of identically independent distributed random variables, 
with a power law probability density function with exponent fi > 2. Since the 

variance o"^ is finite, the central limit theorem applies and the distribution of the sum 
converges to the Gaussian law with standard deviation ^ a^/N. For finite, the 
Gaussian law only describes the central part of the distribution of the sum, up to a 
cross-over 5*0 ~ aVN \nN |^5[|. Beyond 5*0, the distribution of the sum is a power law 
with the same exponent fi and the weight in probability of this power law tail decays 
slowly as oc as increases. In practice, consider the three sample 

sizes = 10^, 10^ and 10^. The corresponding cross-over values are respectively 
^0 ~ 2.1, 3 and 3.7 times the standard deviation of the central Gaussian part of the 
distribution of the sum. These estimates suggest that, even if the general condition 
for using the Kalman filter is satisfied for noise density distributions decaying faster 
than the Gaussian (or covariance) approach, which is optimal in the linear least- 
variance estimation, is not necessarily optimal when relatively large fluctuations (of 
size equal to two standard deviations or larger) occur. 

We thus propose to explore how the optimal Kalman filter is modified when the 
objective is to minimize, not the variance of the error estimation but, a natural mea- 
sure of the large errors, namely the tail of the distribution of the Euclidean norm of 
the errors. Our approach thus develops an alternative class of linear unbiased esti- 
mators different from the standard linear least-variance estimation. Our emphasis is 
in trying to control and minimize the large (and rare) errors, with the penalty that 
the variance, if it exists, will be sub-optimal compared to the standard linear least- 
variance estimation. In this goal, we focus our attention on noises in the observable 
and the unobservable variables both distributed with a power law tail of fixed expo- 
nent fi and known amplitudes (scale factors). This assumption offers a well-defined 
theoretical limit of "large fluctuations" , putting the emphasis on the complement to 
the "small noise" limit captured by the standard linear least-variance approach. 



In addition to its pure theoretical interest, we observe that many systems in Nature 



are claimed to exhibit power law distributions and the present results may thus 
have direct application. Power law distributions have been found to quantify the 
size- frequency Gutenberg- Richter distribution of earthquakes, of hurricanes of 



volcanic eruptions, of floods, of meteorite sizes and so on. The distribution of seismic 
fault lengths is also documented to be a power law with exponent ~ 1. In the 
insurance business, recent studies has shown that the distribution of losses due to 
business interruption resulting from accidents P7|,|25[| is also a power law with /i ~ 1. 



Several previous works have addressed related issues. Le Breton and Musiela's 



work [IT3 is closest to ours but has limitations: (i) it is based on a continuous time 
description of the dynamical and observation processes and is thus more difficult to 
apply to concrete situations ; (ii) it minimizes the difference between the true dy- 
namics and a filtered observation in the L'^-norm sense where fi is the index exponent 
of the Levy distributions; in other words it relies on an explicit solution of the dy- 
namics; in this way, Le Breton and Musiela circumvent the two delicate questions of 
generalizing the covariance matrix and of choosing the objective function to optimize; 
(iii) it does not really address the genuine Kalman problem which consists in mixing 
forecasts and observations. Recently, Ahn and Feldman have proposed a filter for 
the case where the signal is Gaussian while the observation noise is a Levy process. 
Their filter is optimal in the sense of minimizing the error, i.e. the distance be- 
tween the true dynamics and the filter output. This choice of the is realizable in 
their case because the signal is assumed Gaussian and the integrability assumption 
is thus satisfied. The problem however is that the optimal filter will depend in gen- 
eral upon the choice of the norm that measures the prediction error. In addition, 
due to the rather intractable nonlinear recursive solution they obtain, they propose 
a sub-optimal filter for numerical purpose. 

Our approach circumvents these difficulties by focusing on large errors. Our goal 



is thus to minimize the large errors between the analysis (obtained by assimilation of 
observation with prediction) and the true trajectory. With this choice, the technical 
problem we have to solve is to characterize the tails of error distributions, which are 
a function of the tail of the distributions of individual dynamical variables and of 
noises and of their mutual dependence. In order to determine these dependences, 
we propose to use the concept of a "tail-covariance" matrix that generalizes the 
standard covariance matrix of the analyzed signal to the case of power laws and Levy 
distributions. Tail-covariance matrices are constructed as the matrices of signed scale 
factors (the scale factors being defined as the global amplitudes of the power law 
tails) of the distribution of all the products of the system and forecasted variables. 
The main idea is thus to replace the characterization of errors and of correlations by 
the ensemble of distributions of the products of all possible pairs of variables. Our 
approach thus replaces a reasoning based on the second centered moments into one 
based on the tails of the distribution of deviations from true values. It is shown that 
the natural error amplitude to be minimized is the trace of the tail-covariance, which 
is a straightforward extension of the standard approach which determines the Kalman 
filter by minimizing the average errors quantified by the trace of the covariance matrix 
of the analyzed signal. 

The paper is constructed in a pedagogical manner, from the simple univariate filter 
problem to the full multivariate Kalman filter. In section 2, we formulate the problem 
for the univariate filter with power law noises, corresponding to the situation where 
only novel observations are assimilated without mixing with a dynamical forecast. 
A detailed discussion is offered to compare the power law case with the standard 
Gaussian case. In section 3, we generalize this problem to the case of multivariate 
estimators. In section 4, we address the Kalman problem of mixing both forecasts and 
observations with power law and Levy law errors and develop our general solution. 
The special univariate case is studied in great details by numerical experiments to 



contrast the performances of the Kalman-Levy with those of the standard Kalman- 
Gaussian filter. 

II. UNIVARIATE ESTIMATOR: PRINCIPLE OF ESTIMATION WITH 

POWER LAWS 

A. Formulation of the Problem 

Take two samples and x° of an observable state variable x; the superscripts {-Y'" 
correspond to forecast and observation of the data assimilation system, respectively. 
The two samples are contaminated by independent noise and uj°. The estimate x 
of X is sought as a linear combination of x° and x^ with the corresponding positive 
weights and K° 

X = K^x^ + K°x" . (1) 

One of our main goals is to determine the optimal weights so as to minimize the 
resulting uncertainty of x. We make two assumptions concerning samples in this 
study. 

The first assumption that the two samples are unbiased 

{x') = = (x) , (2) 

where {z) is the expectation of z. Expectation of the state variable (x) is not known 
a priori. By requiring further that the estimate should be unbiased (x) = (x), we 
obtain a relation between the two weights + = 1 which allows us to rewrite 
(S) as 

X = x^ + K°{x° - xO , (3) 

and therefore reduces the problem to the determination of a single unknown < 
K° < 1. This expression @ can be interpreted as a filtering of the observed data x° 
into the dynamical forecast x^ by a weighted increment K°{x° — x^). 



The second assumption is that both sample errors, 

= x^'° -X , (4) 

are distributed according to a power law as defined in or according to a Levy law 
as defined in Appendix ^ with the same exponent fJ'° = /i. The important property 
for our purpose is that the tails of the probability density functions of the two sample 
errors are independent and given by 

P{J'") ~ ±00 , (5) 

where the subscript {■}± refiects that the tail distribution can be asymmetric de- 
pending on the sign of In this study, we focus on the symmetric case, i.e.. 

The family of power laws is characterized by two parameters, the exponent /i and 
the 'scale factor' C. The exponent /i, on one hand, controls the decay rate of the 
probability as well as its scaling (or self-similar) properties. The scale factor C, on 
the other hand, controls the overall amplitude of the power law tail, i.e., the larger 
it is, the more important is the power law tail. More precisely, if the power law tail 
d^) holds for UJ larger than some minimum value ci^min, the weight in probability of 
the power law, i.e. the probability that u is larger than ujI^^^ is {C^'° / fi){ujl^^^)^^. As 
shown in Appendix ^ in the case of Levy laws, the scale parameter fully characterizes 
the distribution for all variations (and not only in the tail). 

Notice that (§) can be rewritten in terms of the dimensionless variable uj/Ct^ with 
the superscripts dropped for simplicity 

P{u)duj- \ d{uj/C~A \uj\ — ^±00, (6) 

\uJ/CT^\^+^' ^ ' 

showing that C'' is the characteristic scale of the self-similar fiuctuations of u. For 
fi < 2 (resp. fi < 1), the variance (resp. mean) is not defined mathematically. We 
shall see the effect of these /^-dependent properties throughout this study. 



Using the continuation property given in p| (Problem 15 of Section 12) on the 
characteristic function of distributions with power tails, we obtain the following re- 
sults. Let us call a /i— variable a variable with a distribution function with a power 
law tail. Then, 

(i) if Wi and Wj are two independent /x-variables characterized by the scale factors 
Cf and Cf , then Wi + Wj is also a /z- variable with given by Cf + Cf . 

(ii) If w is a /i- variable with scale factor C, then p x w (where p is a real number) 
is a /i-variable with scale factor p^C. If p < and the distribution of w is 
symmetric, then p x w is a /i- variable with scale factor |p|''C. 

(iii) If w is a /i- variable, then sign{w)\w\'^ , with g > 0, is a ^-variable. 

(iv) If Wi and Wj are two independent /i-variables, then the product x = WiWj is 
also a yU-variable up to logarithmic corrections. 

Using the rules (i) and (ii), we find that the distribution of x is also a power law 
d^) with the same exponent n, but with an adjusted scale factor C given by 

C={1- K°YC^ + {K°YC° . (7) 

The expression (^, which is valid for < K" < 1 such that the scale factors 
remain positive, can be reduced to the usual result for the Gaussian distributions 

= {l-K°y{a'y + {K°)\a°)\ (8) 

by setting the exponent fi = 2 and replacing the scale factors C^'° with (cr^'°)^. We 
thus see that the scale parameter C is the generalization of the variance a^. The 
technical reason for this comes /^from the form of the characteristic function of a 
distribution with a power law tail, as given in (Problem 15 of Section 12). The 
situation is even simpler to discuss when considering symmetric Levy laws whose 
characteristic functions read 



L^{k) = exp {-a^\k\'') , for < < 2 , (9) 

where constant proportfonal to the scale parameter C [|TT|]. The important 

point is that, for /i strickly less than 2, the inverse Fourier transform of L^{k) gives 
a power law tail, while for yU = 2, it gives a Gaussian law. The continuity between 
the expressions (0) and (H) can thus be traced back to that of L^{k) as a function of 
II at n = 2. (see Appendix for a more formal derivation of this fact). 



B. Solution 

The standard optimal estimation methodology consists in minimizing the vari- 
ance 0"^ with respect to the weight factor K°. The solution for K° then gives the best 
weighting in the sense that we remain with the smallest uncertainty from the esti- 
mation with the novel data, by minimizing the expectation distance between x and 
X in the mean-square sense. In order to generalize this methodology to the situation 
where the errors are distributed according to power law distributions, we propose the 
following central idea, i.e., to minimize the scale factor C with respect to K° 
d 



dK° 



C = 11 -(1 - K^'Y-^C^ + (ir°)^-^C° = (10) 



(7 = /i(/i - 1) [(1 - K°Y-^C^ + {K°Y-^C° 



>0. (11) 



The justification for this procedure is that the uncertainty in the estimation x of x is 
inescapably distributed according to a power law distribution with the same exponent 
yU as a result of the rules (i)-(ii) given above. Consequently, the optimization using 
the weight K° can be performed only in one purpose, namely to decrease the global 
amplitude of the power law controlled by C but without be able to distort its shape 
defined by fi. The optimal weight K° as the solution to (p!0| ) depends on the value of 
the exponent fj, as follows. 

1. For /i > 1, the minimization of C given by (^ with respect to K° gives 



where 

A ^ (13) 

is the ratio of the characteristic error size of the two samples, as defined in the 
distribution (|^). The resulting optimal scale factor is 

C = — — & (14) 

[1 + j 

where the superscript {■}'^ stands for analysis according to the data assimilation 
convention. 

2. For /i < 1, there is no optimal solution for K° which is not on the boundaries 
of the search interval and that minimizes C, because d'^C /d{K°)^ < violates 
the second condition in (0). Physically this implies that the fluctuations are 
so wild that the estimation by the weighted average is not a good strategy, and 
that only the measurement with the smallest scale factor should be kept for the 
estimation of x 

for C° > 

K"={ , (15) 

for C° < 



and therefore 



Cf for C° > 

= <( . (16) 

C° for C° < 



This second case K° = 1, consisting in trusting observations over the forecast, 
is known as "direct substitution" 0. Here, we have shown that it constitutes 
indeed the best strategy in the specific case < 1 and C° < C^. Note that 



a similar solution applies when the two observations have different exponent 
fi: full weight K° = or 1 should be put on the observation with the largest 
exponent, as it has the smallest fluctuations. This solves the general case as 
well. 

To make the problem interesting, in this paper we consider all noise sources to 
have the same exponent /i, so that the problem is a "fight between scale-factors". 
This case is not as restricted as it would appear at first site: if the mechanisms 
leading to the power law tail are intertwinned, such as for instance with a common 
source of underlying multiplicative noise, it can be shown [l^Jl^ that the power law 



exponent of the different variables will be the same as soon as there is non-vanishing 
coupling between the variables. This case will be investigated in a forthcoming work. 

The following argument retrieves (0). When neither the variance nor the mean 
exist, and when the minimization (|10D of the scale factor becomes meaningless, the 
last natural quantity to estimate is the probability that the error remaining after 
assimilation is smaller than the error on the two measurements. Suppose that we have 
the knowledge that the errors in the second measurement are larger in probability 
than that of the first measurement, i.e. C° > C^. We then require the maximization 
of the probability Pimprovement that 

\{l-K°)J + K°uj°\<\J\ . (17) 

Let us assume that lj^ is found positive. Then, this probability is the same as the 
probability that 



2 



-— + l]iU^<iu"<J 



The probability for ( p!7D to be verified is thus 



f 

OO 

,f of/. ,f\ 



J(~^+l)L0f 



^improvement = 2 / du' P\lo') I ^ ^ P°(^°)rf^°, (19) 



where the factor 2 comes from the counting of the cases where uj^ can be found 



negative. By taking the derivative of (|l^) with respect to we obtain 



C^-Pimprovement 4 f°° j, .f , nt L .f\ DO f f ^ , -, \ , ,f 



which is always negative. Thus, the probabihty that the error is reduced is maximum 
for K° = 0, i.e. without assimilating the new observation. Intuitively, the power 
law tails with exponent /i < 1 are so "wild" that it is preferable to keep only the 
observation with the smallest scale factor. A similar derivation holds in the case where 
the errors in the second measurement are smaller in probability than that of the first 
measurement, i.e. C° < C^: the probability that the error is reduced is maximum 
for = 1, i.e. with the assimilation of the new observation and the rejection of the 
first one. Again, the observation with the smaller scale factor is preferred and the 
other is rejected in this "wild tail" regime /i < 1. 

C. Properties of the "Levy-estimator" solution 

We now examine the fundamental properties of the optimal weight K° given by 
([T^) which holds when the distributions of errors are pure Levy laws as well as when 
they only exhibit a power law tail controlling the large variations. Figure [l| shows the 
infiuence of the tail exponent fi on the optimal weight K° as a function of the error 
ratio A of the two measurements given by (pIS]) . As fi approaches 1, K° crosses over 
very sharply from one to zero when A goes through 1, recovering the regime fi < I 
given by (|T3|). For larger /i's, the transition of K° from 1 to is smoother as A varies. 

The result (|T^ for fi = 2 holds not only for the power law distribution itself with 

1 -C 1 

A = (C°)2/(C )2 but also for the Gaussian law distribution with the weight 

1 



where the superscript {-j^ corresponds to the Gaussian, and 



- ^ (22) 

is the ratio of the characteristic error size according to the Gaussian law. Such K'-^ 
minimizes the variance given by (P). This is natural since the stable Levy law with 
= 2 is nothing but the Gaussian law with the exact correspondence C^'° = (cr^'°)^ 
(see Appendix ^ there should be no confusion between the exponents /i defining the 
characteristic functions of stable laws and the exponents yU of arbitrary power laws). 
Thus, the curve for yU = 2 in Figure ffl also applies for the Gaussian law with K° = 



and A = A"^. The result (12) reflects the impact of the relative uncertainties in 
and x° that are quantified by a parameter depending on the ratio of characteristic 
error size A~ = (^C°/C^^''~\ 

The weight also represents the normalized increment {x — x^) added to the 
initial difference {x° — x^) as seen from (|^): 

- f 

= J . (23) 

x° — x^ 

Figure |l] therefore can be interpreted as showing the normalized increment depending 
on the tail exponent fi, with the extreme cases x = x° a.t K° = 1 and x = x^ at 
K° = 0. At A = 1 where x^ and x° has the same uncertainty in terms of scale factor 
= C° (|T3|), K° = 0.5 puts X at the exact center point between x^ and x° for any 
fi. For A < 1 (resp.A > 1) where x° with scale factor C° is more (resp. less) accurate 
than x^ with scale factor C^, the smaller the exponent /i is, the closer x is to the 
more accurate sample x° (resp. x^). The estimation by the weight K° for the heavier 
tail distributions with /i < 2 therefore favors the accurate sample more strongly than 
in the least-variance case. Interestingly, this situation is reversed for power law tails 
with exponents /i > 2. i.e. the weight favors the accurate sample less strongly than 
in the Gaussian case. This situation applies in particular to exponential distributions 
that are formally obtained as the limit /x — > oo. 



D. Quality of Improvements: Levy versus Gaussian estimators 



1. Case /i > 2 

Let us investigate the pros and cons of the solution (|12|) for K° as the optimal 
weight for power law tails in contrast to its Gaussian counterpart (^T]) giving 
In this goal, we propose a specific example using the Student's distribution with n 
degrees of freedom, whose density function [jl4|. 



PM = 4Mt . . (24) 



yUTT r 



2 

— ) 

is defined for — oo < u; < +oo. The Student's distribution P^{oj) has a bell-like shape 
like the Gaussian (and actually tends to the Gaussian in the limit /i — > oo). It is 
however a power law like (^) for large \uj\ with a tail exponent equal to the number 
/X of degrees of freedom defining the Student's distribution, with a scale factor 

Y (li±l\ 

C,{s) = /i^ . (25) 



The parameter s represents the typical width of the Student's distribution. The 
variance exists only for /i > 2 and is given by 

Var = a^ = . (26) 

We assume that the forecast (resp. observation) sample (resp. x°) has an error 
uj^ (resp. oj") distributed according to the student's distribution (p^ ) with typical 
width (resp s°) but with the same exponent /i. The Levy weight given by ([T^ ) 
and the standard Gaussian weight given by (|2l|) are represented by the same 
error ratio 

= = ^ = A . (27) 



It is worth recalling that K"^ which we denote here , given by (0) is obtained so 
as to minimize the scale factor Cx given by while given by (|2l|) minimizes 
the variance Var^ expressed by The impact of the difference between these two 
weights can be quantified in several ways for > 2 where the variance exists. 

One measure is the corresponding variance Var^ = (1 — K°Y[a^Y + (-ft'°)^(cr°)^ 
of the total error (1 — K°)uj^ + K°uj° given by 

Var^ = ^ /z^i^'r (28) 

(l + A~) 

Varf = ^ , (29) 

where Varf (resp. Varf ) is the variance obtained by using the solution (resp. 
K'^). Figure ^ shows Var^ and Var^ as a function of A for /i = 3: by construction, 
we verify that the variance of the total error is less with K'^ than with K°. This is 
expected since, by construction, minimizes the variance. However, the difference 
is small, less than 10%. Anyway, this measure would then suggest that the Gaussian 
filter is better. 

However, for power law distributions of errors, the variance is well-known to be a 
rather poor representation of the variability, especially in the tail. It is thus interesting 
to compare the scale factors and obtained in the two schemes since they 
quantify the total weight of the power law tails. We determine the scale factors for 
the Levy and Gaussian weights as another measure of the goodness of the filtering 
method: 



= -^^C' , (30) 



A^ 

(l + Am-1 

= JTTWOTW'^' ■ ^^^^ 

(resp. C?) is obtained by putting (resp. K'^) in expression (0). Figure ^ 
shows the scale factors and C? as a function of A for /i = 3: the weight is now 



found to be better than the usual Gaussian weight K'^ , since a smaller scale factor 
implies smaller probabilities for large fluctuations. The improvement is however not 
very large, typically of the order of or less than 10%, i.e. of the same order as the 
difference between the variances (but in reverse ranking). These relatively small 
differences between the Gaussian and Levy filtering procedures become enormous for 
the case < 2 discussed next. 

The comparison between Figures ^ and ^ shows that one cannot achieve simul- 
taneously the minimization of the variance of the error and the minimization of the 
weight of the tail of large deviations of the error: either one or the other can be 
optimized. 



2. Case /i < 2 

The situation is dramatically different when /i < 2, for which the variance is 
not mathematically defined. In this case, an empirical determination of the variance 
is very unstable and absolutely unrehable. The standard Gaussian weight K'^ is 
completely useless. In contrast, the Levy weight gives a simple and clear-cut 
recipe that allows one to optimize large fluctuations in the weighting procedure. 

Let us illustrate this result by the following numerical experiments using the 
Cauchy distribution for the errors 

Pciu) = ^ , (32) 

with typical width s. The Cauchy distribution (^) is one of the stable Levy distri- 
bution and possesses a power law tail with exponent /i = 1 and a scale factor C = s. 
Let us assume that the first (resp. second) sample (resp. x°) has an error (resp. 
uj°) distributed according to the Cauchy distribution (^) with typical width (resp 
s°). Then, the resulting distribution of the errors on x is of the same form ( ^21) with 
the same exponent fi = 1 while the scale factor is given by 



C£ = (1 - K°)s^ + K°s° . (33) 

As we have found above in (0), the weight K° that minimizes is 

I for s° > 

K"=\ , (34) 

I 1 for s° < 

since the Cauchy distribution is on the border hne /i = 1. This can be verified straight- 
forwardly as a result of the linear dependence of on K° in ( |55D for which the 
optimization always selects one of the boundaries. 

Consider the case where = 1 and s° = 2. The Levy estimator imposes to choose 
K° = 0, i.e. to reject the information provided by the second sample x°. Let us now 
compare this recipe with the result obtained by applying the standard Gaussian 
weight K'^ on data generated by using the two Cauchy laws with = 1 and s° = 2. 
Specifically, we generated two sets of 1000 random numbers and distributed 
according to the Cauchy law (^) with = 1 and s° = 2. From each of these 
1000 numbers, we can estimate numerically the variance and find (cr^)^ = 3.36 x 10^ 
and (0"°)^ = 4.07 x 10^. The Gaussian estimator (^) then recommends the value 
= [1 + io-°/a^y]^^ ^ 0.45 which is very different from K° = given by (^. We 
should stress that the estimations of the variances (o"^)^ and (cr°)^ are highly unreliable 
because they can change by orders of magnitude from one sample to another. The 
reason is, as we have said, that the variance is mathematically infinite in this case, 
and therefore any estimation of it is bound to be dominated by the few largest random 
numbers that occur by chance in the series. 

Two lessons are thus to be learned from these numerical simulations: i) estimating 
the variance for distributions with exponent /i < 2 leads to very unstable results; ii) 
the resulting recommendation of the standard Gaussian weight can be very wrong. 

III. MULTIVARIATE ESTIMATOR 



A. Definition of the model 



When the state variables are multi-dimensional, their errors may be mutually 
dependent. If the errors are distributed according to the power or Levy laws, we need 
to transform the coordinate of errors to express them as linear sums of independent 
noises in order to be able use the rules stated above in points (i)-(iv). 

We consider a problem of estimating a multi-dimensional state vector x G IR^ 
using two samples G and x° G IR^. Both forecast and observations are made 
for all state variables. As in the case for the univariate estimation, we assume that 
the estimate x of x can be expressed as a linear combination of x^ and x°. Requiring 
the unbiased condition leads to one unknown weight matrix in the estimation 

X = (I - K°)x^ + K°x° , (35) 

where I is an identity matrix and K° G IR^^^ is the weight matrix for the observation 
sample x°. Our goal is to determine the optimal K° which gives the least uncertainty 
in X. We use the notation K for the weight matrix in connection to the standard 
Kalman(- Gaussian) gain matrix of sequential estimation ||13|| . 

We assume that the sample error vectors are linear sums of independent 
variables with symmetric distributions 

ef,o _ ^f,o _ ^ _ G^'°uj^'° (36) 

where the probability distribution of each is associated with the same exponent 
/i and usually different individual scale factors The assumption that the distri- 
butions of uj^'°^s are symmetric implies that the same scale factors characterize 
the tail of large positive and negative realizations. The transformation between inde- 
pendent to mutually dependent e^'° is provided by the matrix G^'° G IR^^^. We 
shall use this decomposition scheme repeatedly in the sequel as it allows us to treat in 



a simple way the interplay between the power law distributions and the dependence 
between variables. 

In the standard linear least-variance estimation theory, one calculates the covari- 
ance matrix P = (ee-^) of the error e = x — x and minimizes the expectation of the 
distance between x and x in the mean-square sense, i.e., one minimizes the trace 
of the covariance matrix P. The covariance calculation is an essential step to guar- 
antee that all error components are suitably accounted for. We thus propose our 
key idea to generalize the covariance matrix in the regime fi < 2 where it does not 
exist by using the concept of the tail-covariance defined as the matrix of scale factors 
of the distribution of all products el'"e^j°. We prefer this approach to the so-called 



"co-variation" as it is more intuitive and also presents nicer properties, in par- 
ticular the tail-covariance matrix remains symmetric. The intuitive meaning of the 
co-variation is less transparent than for the tail-covariance, which explicitly measures 
the correlations between large events only, while the co-variation picks up contribu- 
tions from the core of the distributions. Let us mention that a simplified version of 
the tail-covariance without signs (see below) has been used in the context of portfolio 
theory |[. 

Consider the sample error vectors e^'°. We thus study the product el'°e^j°, whose 
probability distribution constitutes the natural generalization of the covariance as 
already pointed out. Using properties (iii) and (iv) above, one finds the following 
result, expressed symbolically and without the superscripts for simplicity: 

N N N 

^i^j - ^IC^iiWC^jil [f-variable] + XI [/^-variable] , (37) 

/ I' V't^V 

where the symbol "/i-variable" defines a random variable distributed with a density 
distribution with a power law tail of exponent ^. Expression (|37|) means that the 
tail of the distribution of the product e^ej is dominated by the first term which has 
the smallest exponent fi/2 and thus heaviest tail, and will directly be sensitive to the 



product IGjillGjil for all the independent errors uji. More precisely, an analysis of the 
cumulative distribution of the products eiej will give an asymptotic slope of — ^ in a 
log-log plot and a scale factor Ci associated with ui proportional to |G'ji|^|G'j;|'^. 

We use this set of scale factors associated with the distributions of eje-,- in order 
to define the tail-covariance matrix B G IR^^^ with the following guidelines. It is 
natural that the tail-covariance matrix should contain the information on the tail of 
the products e^ej with distribution (|57D. A bona- fide generalization of the covariance 
matrix requires two additional conditions. It should be sensitive to the sign of the 
dependence between the variables, i.e., if increases (resp. decreases) on average 
conditioned on the increase of ej, the dependence is positive (resp. negative), gen- 
eralizing the existence of positive and negative correlations. In addition, a suitable 
definition of the tail-covariance matrix should be such that it recovers the standard 
covariance matrix for the value /z = 2 corresponding to the special case where the 
stable laws reduce to the Gaussian distribution, as briefly recalled in Appendix 0. 
These considerations lead to the following unique specification for the tail-covariance 
matrix by diagonalization: 

B = GltlCG^ltl . (38) 

where C G IR^^^ is a diagonal matrix associated with and G^a' G H^^^ is the 
corresponding eigenvector matrix. The operator {-j'^^ means that each element of 
matrix or vector is defined by 

Ggl = sign(G,,) (39) 

i.e., the absolute value of each element is raised to the power (3 and then multiplied by 
its sign. This operation can be applied to scalars as well. Without the sign operator 
in (|39|) , the tail-covariance matrix B would be just the matrix of scale factors of 
all the products e^ej. The introduction of the sign function is an essential additional 
ingredient introduced to account for the sign of the dependence between the variables. 



For /i = 2, Gj| = sign(Gij) \Gij\ = Gij, and we check that the tail covariance is 
exactly the same as the error covariance 

B = P = (ee^) . (40) 

This correspondence may appear paradoxical if one interprets the case /i = 2 as 
corresponding to a power law, which has infinite variance. As we already discussed 
for the unidimensioncal case, the technical reason for this comes /^from the form (j^) 
of the characteristic function L^ik) of a distribution with a power law tail, as given 
in PI (Problem 15 of Section 12) or of a symmetric Levy laws. We stress again the 
important point that, for fi strickly less than 2, the inverse Fourier transform of L^lk) 
defined by (§) gives a power law tail, while for /i = 2, it gives a Gaussian law. The 
continuity between the expressions (^) and (|40|) can thus be traced back to that of 
L^{k) as a function of yU at /i = 2. 

The transformation of the scale factors between the mutually dependent errors e 
and independent errors uj as in ( pSj) can be performed in both directions, i.e., not 
only from right- to left-hand side to compute B when G and C are known, but also 
from left- to right-hand side to obtain C (and G) by diagonalization of B. 

B. Solution 

Our aim is to obtain the optimal weight K° in ( p5D for the best estimate x. In 
this goal, we form the set of products eje^ where e = x — x and study their probability 
distribution. As in ( ^7|) and (|38|) , we retain only the term decaying as a power law 
with an exponent /i/2 and get the following tail covariance matrix of x — x for an 
arbitrary weight matrix K°: 

B = (g^ - K°G^) (g^ - K°G^)'^''' + (K°G°)t2l C° (K^G")^^^] . (41) 
Here, we assume that errors and e° are mutually independent. 



In the univariate case, the optimization process is unique and corresponds to 
minimizing C with respect to as in In the multivariate case, however, 

the optimization may be defined in several ways. For example, as recalled above, the 
standard linear least- variance estimation theory attempts to minimize the expectation 
distance between x and x in the mean-square sense, i.e., to minimize trace P given 
by (^OD. For fi < 2 where the covariance P does not exist, we propose to minimize the 
"average" scale factor, i.e., trace B. Such an optimization implies that the uncertainty 
in X is globally the smallest (see Section 2-2 for the univariate case). 

Since the expression ( ^if ) is not smooth in K° due to the presence of the absolute 
values, some care must be taken in the minimization. The non-smooth character of 



(|4l|) makes the differentiation approach to the minimization more cumbersome, as 
one must keep track of the discontinuities. The optimal K° is obtained by solving 

d 



trace B = , 



(42) 



where 
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Taking the derivative with respect to Kf^, we obtain: 
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trace B 
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(44) 



where we have used the fact that d\x\^/dx = /isign(x) ^ = fxx^'^ ^\ with our 
definition (|39D. Note that the same N coefficients K°j^ with m = 1, and only 



them occur in the equations ^^trace B obtained by fixing i and varying j from 
1 to A^. Finding the optimal K° thus involves solving independent systems of 
equations, one for each individual diagonal term Ba, where each system consists of 
the N nonlinear equations g^Bu = for the unknowns K"- at i fixed. 



Optimality of K° is ensured by the positive definiteness of the Hessian matrix for 
I3ii with respect to K°j, for each of the A^ independent systems (each defined as we 
have shown by a fixed i) of A^ equations (obtained by varying j at fixed i). While for 
the univariate case, this amounts to ensure the validity of only one equation (0), in 
the multivariate case we need to study the positive definiteness of the Hessian: 
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(45) 



expressed at the values of K° solving (0). If this matrix is positive definite, then 
minimizes Ba and is therefore optimal. There are two issues associated with this 
formulation for the optimal k (i.e., K°): 1) the possible existence of singular terms 
for fi < 2] and 2) the solvability of the nonlinear system. We address them in the 
appendix ^ and show that the matrix is indeed positive definite and that there is 
only one solution. 



C. Special cases 

1. Case fj, = 2 

For /i = 2, the Levy estimator is the same as the Gaussian one which minimizes 
trace P = trace B as given by (140|). In this case, the system becomes perfectly linear 
with I = yU — 1 = 1 and leads to the following analytical solution for the optimal 
weight given in matrix form 



= (b^ + B°) \ (46) 

which results in the optimal estimates of the state variable and corresponding covari- 
ance matrix 

= B° (b^ + B°) x° + B^ (b^ + B°) x^ , (47) 
B^ = B°(Bf + B°)"^B^ (48) 



2. Independent noise 

When the errors e^'° are independent, i.e., G^'° = I, the trace and diagonal com- 
ponents (^3[) for the average scale factor can be simplified into 

N N 

trace B = ^ ^ [(l - K° ) + (k° C;] , (49) 

i=l p=l 

where we have omitted the absolute values and the sign functions as we look for values 
of the Kalman weights between and 1. Therefore, the problem can be reduced to 
the univariate estimate for each element Xi independently, as defined in equation (|^. 
The equations (|10|,|11]) are thus replaced by their exact equivalent derived from the 



two conditions (^2]) and (^5]). The solution Kf^ for each i is given by ( |T^ ) where 
Kf^ = K° is obtained by setting A = {C°)i^/{Cl)i^ (see (|T^). Due to the hypothesis 
of noise independence, the non-diagonal coefficients K°j are all zero for i j. 



IV. THE KALMAN-LEVY FILTER 
A. Problem 

We are now in a position to construct a sequential data assimilation methodology 
when the noises are distributed according to the power or Levy law. The standard 



assimilation problem is formulated as follows. We consider a linear discrete stochastic 
dynamical system of state variables x G IR^ 

x| = Mfe,,_iX*_i + r7*_i , (50) 

where superscript {•}* denotes the true state and r}l_i is the dynamical noise. The 
index k corresponds to the time sequence when the observations e H^*' are taken 

as 

= H,x* + el , (51) 

where is the observational noise and e M^*^^ is the hnear observation function 
which can vary at each time step k. These observations y^ are assumed to be linear 
functions of the state variable x^ of the system with an additive noise. 

The estimation methodology developed in the previous sections is now extended 
to perform a filtering in order to estimate x* sequentially, by assimilating y^ into 
the deterministic forecast x^. Here, we assume that Mk,k-i and are known. The 
assimilation cycle k is defined over a time interval [A; — 1, /c] between the two adjacent 
observation events. It consists of the following two steps: 

1. a deterministic forecast x^ of x^. from a given initial condition as the best 
estimate of Xj._^ based on the model 

x^, = Mfe,fe_ix^_i . (52) 

The forecast is based on the analysis performed at the previous time step. 

2. This forecast is then used to construct the new analysis x^ which is mixed with 
the assimilated observation. This leads to the probabihstic analysis x| of x^ 
obtained as the weighted average of x^ and y^ under the unbias assumption at 
time k: 

^1 = 4 + (y^ - H,xQ = (I - K,H,) x^, + KkYl ■ (53) 



Accordingly, the errors associated with and are auto-regressive processes. 

xl - X* = Mk,k-i (x^i - xti) - r)l_, , (54) 
x^ - X* = (I - K,H,) (xf - xi) + K,el 

= (I - KfcH,) Mk,k-i (x^i - xti) - (I - KfcH,) r/ti + K^e" (55) 

and the only unknown to be determined is the so-called "gain matrix" needed in 
order to complete the assimilation cycle. 

Our goal is therefore to determine the gain matrix K^, where the superscript L 
refers to the Kalman-Levy filter, which results in the least uncertainty in x^ in each 
assimilation cycle when the noises are distributed according to the power or Levy law 
with the exponent fi. As in (^), we express the sample error vectors as linear sums 
of N independent /x- variables 

rj, ^ GM, e, ^ GWk ■ (56) 

Because (Q) and (|55| ) define linear autoregressive processes with Levy-stable or power 
law probability distribution of the noises, the errors in forecast and analysis are also 
distributed according to the power or Levy laws with the same exponent /i. Without 
loss of generality, they can thus be written as 

f,a ^ /^fia f,a fr^\ 

Xfe - Xfc = Gfc' u,: , (57) 

which defines the matrices G^'^ and the vectors o;^'^ of independent Levy or power law 
processes. Consequently, all error and noise distributions in the assimilation cycles 
are characterized by the corresponding tail covariance matrices 

bI'^=(g1'^)'"'c1'^(g1'^)^'"^ (58) 

Br = (Gr)f"^cr(Gr)^f"^ , (59) 

where C G H^^^ is a diagonal scale-factor matrix associated with ui and G'^^ is the 
corresponding eigenvector matrix. Given a B, the corresponding G and C can be 
obtained by diagonalization. 



B. Solution 



The optimal Kalman-Levy (KL) filter which minimizes the global average 
error is obtained by minimizing the trace of the tail-covariance matrix B| of the 
resulting probabilistic analysis of x^. Since the expression for is not smooth 
in due to the presence of the absolute values, some care must be taken in the 
minimization as already discussed. The non-smooth character of B^ makes the dif- 
ferentiation approach to the minimization more cumbersome, as one must keep track 
of the discontinuities. However, we use the approach described in Appendix B to 
check if the two conditions are simultaneously verified: 

d 



traceB^ = , (60) 



and positive-definiteness of the Hessian matrix 



traceB^ . (61) 



We solve the first condition and then examine the second condition. This is achieved 
by taking the following two steps in each assimilation cycle. 

Step 1. Dynamic forecast: 

Given a set of initial conditions described by the subscript {■}k-i which 
are known, the forecast is performed deterministically to advance from 
k-lto k based on (1521) and (M) 



xi = Mk,k-i^U (62) 

leading to the tail-covariance of the forecasts at time k: 

Bl = (M,,,_iGtO (M,,,_iGty + BU , (63) 



where the definition (^,^) is used. 



2. Probabilistic analysis: 

Given the forecast with from Step 1 along with the observations 
with tail-covariance B|,, the analysis provides the optimal estimate 



= x^, + Ki (yl - H,x 



(64) 



with tail-covariance 



f L fAtflf/f L 

Gfe - K^.HfcGfcj ^ C;, [Gf^ - K^HfcG 



(65) 



where the definition (|38|J39| ) is used. By letting subscripts represent the 
matrix elements and dropping the time index k for simplicity, we solve for 
the optimal filter K"^ so that it satisfies the conditions (pDj). The diagonal 
elements of can be explicated as follows: 
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(67) 



Because B^^ does not depend on Kqj for g 7^ the first condition for the 
optimal K[ 
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using the signed power operator as in the case of (^41) . 



For the optimal filter so obtained, positive definitiveness of the Hessian 
matrix 
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is satisfied for yU > 1, following the same approach as discussed in Section 
3-2 and in Appendix B. 

The optimal estimate and tail-covariance obtained by substituting 
into (1^) and (|65| ) become a set of initial conditions for the next 



assimilation cycle k + 1. 

Similarly to the case of the multivariate estimator which is solution of (|4^ ) 
discussed in Section [III B| , the first condition for the optimal KL filter ( |68|) for a 
fixed i leads to a set of self-contained nonlinear equations for N unknowns for 
j = 1, . . . ,N. Minimization of the average scale factor is therefore equivalent to the 
minimization of each tail-covariance element B'^^ for — x\ with respect to the ele- 
ments of Kfc with at least one of the indexes equal to i. Such a set of solutions of 
the nonlinear equations for an arbitrary fi may not be available analytically but can 
be obtained numerically. For /i = 2, the KL filter is reduced to the same formula as 
the conventional Kalman-Gaussian (KG) filter, i.e., the forecast error tail-covariance 
(1631) is 



Bi = M,,,_iB^,Mj,_, + BU 



(70) 



The analysis error tail-covariance (|6^) is 



I - K^H,) Bi , 



(71) 



where the optimal KG gain that satisfies (|60D is given by 



-1 



(72) 



The analysis state variable is also obtained by substituting ( [72D into (Q). In 



this case, sequential data assimilation does not require any diagonalization of the tail 
covariance matrix. 



To understand the fundamental properties of the KL filter, we study the univariate 
problem with = 1 and L = 1 in detail for the case where the exponent fi is larger 
than 1. The case /x < 1 has been discussed in section 2 and leads to a Kalman weight 
equal to either or 1. 

In this case, the tail-covariances B^'^'^ correspond to the scale factors C^'*^'* directly 
and we have the following data assimilation cycle. 

Step 1. Dynamic forecast: 



C. Univariate Kalman-Levy filter 



1. Solution 



x{ = Mk,k-ixl_i 



(73) 



with tail-covariance matrix 



Bi = \M,,k-irB^,_, + Bl 



k-1 



(74) 



as derived 




calculation rules given above in points 




Step 2. Probabilistic analysis: 



xl = {l-KkHk)x{ + K,yl, 



(75) 



leading to the following scale factor 



Bl=\l-K,H,\^Bi + \Kk\^Bl 



(76) 



Its minimization with respect to leads to 
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after substituting the optimal KL gain 
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with the modified relative error ratio 
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Notice that expression (|78D is nothing but rewriting (|I3|). 



(77) 
(78) 



(79) 



(80) 



2. Properties of the solution 



Because the KL filter is designed to minimize B^, we gain insight into its perfor- 
mance by investigating B^ along with Bl in each assimilation cycle. While the state 
variable x^'^ has a stochastic dynamics through the observation y", their scale factors 
B^fj'^ are completely deterministic. Using ( [78| ) and ([7^), the evolution of B^j^^ can be 
expressed as uncoupled one-dimensional maps: 

\Mk,k-i\ti\^ 
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(82) 



where in can be given in terms of B^_^ for ( ^21) by substitution of ([7^) into 



83) 



Two limiting cases can be analyzed. First, in the limit where the main origin of 
variability in the factor F multiplying Bl_^ in the r.h.s. of (|8T|) comes from either 
Mk^k-i, Bl or Hk and not from Bl, this factor F can be considered to be approx- 
imately independent of Bl_i. The expression (|8T|) becomes a multiplicative noisy 
auto- regressive equation which has been much studied in the literature 
The most important result is that Bl remains finite at all times if the expectation 
of the logarithm of the factor F multiplying Bl_^ in (81) is negative. This condition 
ensures that Bl does not grow exponentially at large times. Usually, in this regime, 
if the factor F exhibits intermittent excursions to values larger than one, it can be 
shown that the scale factors Bl_^ themselves will be distributed according to a power 
law distribution. A similar result holds for B^, whose upper limit is bounded by Bl. 

The second interesting case occurs when the system is stationary, i.e., M = 
Mk^k-i, H = Hk, and B^''^ = 5^'*^ for all k. By defining nondimensional tail- 
covariances (which are replaced by scale factors in this single variable case) normalized 
by the dynamical error's scale factor B^, 
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the corresponding dynamical maps (0) and (|82D are reduced into 
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For a given fi, the two parameters controlling the evolution of 6^'^ are the dynamical 
coefficient M and the ratio of the characteristic error sizes of the dynamics over the 



observation defined by 



A" 



H(Bv) 



(86) 



The corresponding KL gain is 



H~ 



1 + 



^7) 



which can be expressed in terms of h^_^ as well. The KL filter parameter (|86[) and 

I), and the resulting KL analysis (|85| ) 



gain (^) are the counterpart of ([TsD and 
takes the form similar to (14) of the Levy estimator in Section II B 



For any values of h^]l\, the scale factors fo^'*^ at the next time step are bounded: 
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indicating that the univariate sequential estimation system cannot diverge as long as 
the exponent /i is finite. 

The evolution of the scale factors is demonstrated in Figure The maps are 
represented by the graphs of h\ and h% as functions of h%_^ for four values of the expo- 
nent = L2, 1.5, 2 and 3 with the set of parameters (A^, M) = (1, 0.9), corresponding 
to a contracting map. Starting from on the diagonal line, the dynamic forecast 
takes bl_^ to 6[ on the corresponding /i-curve (upper group), which is followed by 
the probabilistic analysis down to bl on the corresponding /i-curve (lower group) to 
complete one data assimilation cycle, bl_^ ^ b\ ^ b%. A new analysis b^ is moved 
horizontally onto the diagonal line to become an initial scale factor value for the next 
cycle. 

On each 6^ curve for a fixed yU, the symbols (circle, square, diamond and triangle 
for /X = 1.2, 1.5,2 and 3, respectively) at the intersection with the diagonal line, i.e. 



b'^ = bl = bl_i, are the stable solutions of the KL filter. The maps (|8^) and (|85| ) 
in fact have each a stable fixed-point solution, ¥ and b^, that attracts any initial 
condition for any exponent fi, given a set of parameters (A'', M). For /i = 2 retrieving 
the case of the KG filter, this stable fixed-point can be obtained analytically [p!0| . 

The error probability distributions for the stable fixed-points corresponding to 
the Student's distribution (^) are shown in Figure ^j-e based on the scale factor 
gf.a^ We use {x^,y°) = (0, 1) and H = 1, so that x"" = K^. For small /x, the KL 
filter favors more strongly the better sample characterized by the smaller scale factor 
between the two (F for and (A^)^ for y"). This effect is stronger when /i decreases 
to 1, as discussed in section 2. The value of the fixed point b^ is larger for smaller /i, 
i.e., a system with a probability distribution with heavier tail has greater uncertainty, 
not only because of its slow decay measured by the exponent /i but also due to its 
overall amplitude quantified by the scale factor. Furthermore, the slope of the curve 
bl as a function of bl_i shown in Figure ^ is closer to the horizontal for smaller /i, 
indicating that the convergence to the stable fixed-point is faster for the heavy-tail 
probability distribution. This is because the KL filter with smaller fi tends to favor 
either forecast or observation strongly, depending on their relative noise amplitude 
quantified by the scale factor of their noises, as discussed in Section [11 C 



Since the stationary KL assimilation system quickly approaches a unique steady 
state for a given set of parameters (A^, M) and exponent fi, the stable fixed-point 
defined by ¥ and b^ along with the corresponding optimal Kalman-Levy gain 
suffice to provide a complete description of the stationary assimilation process. Figure 
H shows the stable solution obtained from (|8^ , (^) and (|87D , which are plotted in 
the parameter space (A^, M) for fi = 1.2, 1.5 and 2. All tail covariances (scale factors) 
are plotted in terms of {b)'i^ so as to preserve the characteristic scale as in (|]) of the 
error, independently of the value of fi. 

For convergent dynamics M < 1 with sufficiently large relative observational error 



> 1, ^ 1 and hence favors the forecast (Figure f and i) and hence 

resuhs in ~ U . For divergent dynamics with a larger value of M > 1, however, 
it yields a large value for the forecast's scale factor U (Figure |^a, d and g) with 
respect to relative observational error A**. Here, the KL filter correctly derives that 
the errors are amplified by the unstable evolution of the system. Since Ix is large and 
hence A^ ~ 0, ^ 1 favors the observation over the forecast and hence results in 
~ (A^)'^ (Figure Qd, e and h). This effect derived by the KL filter is more significant 
for heavy-tail probability distributions with a smaller /i, and it is manifested in the 
much steeper gradient of for /i = 1.2 (Figure ^) in contrast to the widely spread 
contour maps observed for = 2 (Figure ^i). 

Note that for M = 0, we retrieve the univariate filter studied in section 2. In 
particular, for A'' = 1, = 1/2, corresponding to equal weights of the assimilation 
on observation and forecast for any exponent /i. The curvature to the right taken by 
the contour maps of the Kalman gain i^T as a function of M can easily be rationalized: 
a larger M implies a larger forecast error, hence a smaller effective A^. One thus needs 
a large observation over dynamical error ratio to get the same effective effect, hence 
the downward convexity of the contour maps. 

3. Relative performance of the KL and KG filters 

We now examine the case where the KG filter (corresponding to putting 
/X = 2 in the solutions) is applied to the system whose true noise distribution is 
the heavy-tail power law (/i < 2). This may happen in a practical implementation 
of Kalman filtering when we do not know the nature of the noises very well and 
a finite variance assumption is made. This is also probably the only choice left to 
the operator in absence of our solution presented in this paper. This exercise is thus 
aimed at quantifying what we have gained concretely by recognizing the non-Gaussian 



nature of the noise and by providing the corresponding solution. We stress that what 
matters according to the KG framework is whether the noise has a finite variance or 
not. In other words, all non-Gaussian noises with finite variance are treated in the 
same fashion within the KG approach by analyzing the variance only. In contrast, 
the KL solution distinguishes noises even if they have a finite variance by analyzing 
the structure of their 'fat-tail" characterized by the exponent /x. For instance, the 
KL gain is different for noise distributions with power law tails with fi = 3 and with 
/X = 4, while in contrast the KG solution is the same for both if they have the same 
variance. 

We formulate this scenario in a general form when an incorrect model exponent 
/i is used to assimilate the data from the system with true exponent fi. In case of 
the KG filter application, this means ft = 2. The model for data assimilation that we 
obtain is equivalent to ([73|)- (|80|) by replacing 



where {■} represents the model filtering. The exponent factor {-jA arises so as to 
preserve the characteristic scale of the noises. 

Use of the model gain Kk 7^ Kj^ due to an incorrect model exponent jl yields 
a non-optimal filtering by definition, in a sense that the analysis scale factor 
is not a minimum. In addition, such a model filtering estimates both forecast and 
analysis tail covariances 5^'^ incorrectly as B^^^, because the real evolution of the tail 
covariances using the non-optimal model gain Kk should follow the non-optimal KL 
filtering scheme which itself uses the true exponent /x 




(90) 



(91) 



Bi = \Mk,k-ir BU + Bl 



(92) 




(93) 



Accordingly, there are three filtering representations, i.e.. 



(i) true and optimal KL filtering S^'*^ using (/x, K^)] 

(ii) true but non-optimal filtering B^jf using (/x, K'^)] 

(iii) model and incorrect filtering B]!^ using [Jx^K^). 

By definition, the optimal filtering (i) is always superior to the non-optimal filtering 
(ii), (-Bfc)'' < (-Bfc)^- It is possible, however, that the incorrect model filtering (iii) 
returns a value for the scale factor which is numerically smaller (see table 1). Since 
the model exponent jl is different from the true exponent /i, the scale factors cannot 
be compared directly to infer the quality of the assimilation process. 

When the system is time-independent, the normalized one- dimensional maps of 
the tail covariances using the non-optimal KL filter with model gain K are 

b{ = |M(1 - KH)\^' b{^^ + \MKHXY + 1 , (94) 
hi = |M(1 - KH) l'^ bl_^ + 1 1 - rai'^ + {KHXY . (95) 

This non-optimal filtering also has a unique stable fixed-point 

b = — , 96 

1 - \M{1-KH)\^' 

b = ^ U — , 97 

1-\M{1 - KH)\'' ^ ' 

provided the condition for stability is satisfied 

< |M(1-M)| < 1 . (98) 

To see this effect of non-optimal filtering for the KG filter application, we apply the 
model gain K = with fi = 2 (Figure P) to a time independent system (p4D-(p^) 
subjected to the Levy noise with true exponent fi = 1.2. The stable assimilation cycle 
for this optimal KL filtering have been presented in Figure ^-c. The unique stable 
fixed-points of the non-optimal filter given by ( p6D and (0) are shown in Figures 



^t,a, 1 

^ and b, in terms of the characteristic scale (6 Because the KG filtering is no 
longer optimal, b are now larger than the corresponding optimal scale factors 6 ■'^ 
of the KL fixed-point (Figure ^ and b). 

To quantify the difference between the KL and KG solutions, we construct the 
differences of the normalized stable fixed-point found in the three assimilation repre- 
sentations (i)-(iii). In Figure ^c-f, we present the comparison for the following two 
cases: 

1. difference between the non-optimal filtering (ii: as in Figure |^a and b) and 
optimal KL filtering (i: as in Figure ^ and b); 

2. difference between non-optimal filtering (ii) and incorrect model filtering (iii). 

All results are shown in terms of the characteristic error scales, 6^ or 6 so that the 
comparison can be made independently of the exponents /i and /2 in the filters. 

The first comparison between (ii) and (i) corresponds to the difference between the 
optimal and non-optimal filtering. In Figure ^ and d, we observe a bimodal structure 
in the difference (6 '^)m — (6 '^)m, caused by the maximum-minimum structure in the 
gain K'-' — (Figure ^e), whose origin is the following. For M < 1 for which 

< for /i < 2, > . The non-optimal gain K'^ obtained from the model 
KG solution thus overestimates the uncertainty of the forecast. On the other hand, 
for M > 1 for which > for /i < 2, K'^ < . The non-optimal gain 
now underestimates the reliability of the observation. Both ways, it increases the 
non-optimal solution in comparison to the optimal solution h^. 

The second comparison between (ii) and (iii) relates to the actual mistake in the 
model solution If'^ made by using the model gain K'~' to the system which reaches a 
different, non-optimal stable fixed-point 6^'^^. As shown in ^ and g, (6^''^)'' — (6^'^^)^ 
are positive and therefore the model filtering incorrectly underestimates the error. 



Figure is the KG filtering application when the true exponent fi = 1.5 is not 
as heavy as the previous case fi = 1.2. Although the KL solution is better than the 
KG one as expected, the difference is smaller: the improvement is of the order of 
5 — 10% at most. The fact that the improvement has a smaller amplitude is clear: 
a larger exponent implies a thinner tail and thus a behavior closer to the Gaussian 
case. Recall that at fi goes to 2, the Gaussian case and solution are recovered. 



4- Numerical experiment 

To check these results derived from the analysis of the deterministic behavior 
of the tail covariances (scale factors) 6^'*^, we present a numerical experiment of the 
stochastic dynamics, observation construction and assimilation processes. We use 
the parameter set (A**, M) = (1, 0.9) with noises distributed according to a Levy law 
with exponent fi = 1.2. The one-dimensional map and probability distribution of the 
stable fixed-point for this parameter set are shown in Figure ^ja and b. To generate 
the Levy noises, we follow the standard algorithm described initially in |^ and use 



the software available at |T8[. The stochastic variables x\ and y'^ are generated over 
10000 time steps. 

Typical results of the KL filtering (yU, Kj^) are shown in Figure ^ The true evolu- 
tion is shown over the 10000 time steps in Figure |^. Note the occurrence of a few 
very large fluctuations that dwarf most of the remaining dynamics. To get a closer 
view, we enlarge figure |]a in the narrow time interval [1000, 1050] shown in figures 
|b-c. Fi gure §b gives the dynamical evolution of the true, forecasted, observation and 
analysis variables, when using the optimal KL filtering, while figure ^c corresponds 
to the use of the non-optimal filtering. Note that the two filtering's use the same 
gain K'^ and therefore result in the same x^'^. One can observe on Figure that 
the optimal KL filtering follows rather closely the observation y^. This results 



from the high value of . In constrast, the non-optimal filtering puts x% midway 
between and x^. The tail covariances (scale factor) quickly approaches the stable 
fixed-point after a few iteration as given in Table |I|, along with the stable fixed-points 
of the non-optimal filtering 6^'^ with (/i, K^) and model filtering with (/i, K^). 
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TABLES 





(i) optimal KL (b) 


(ii) non-optimal (6) 


(iii) model (b) 












1.87 


2.09 


1.18 




0.99 


1.24 


0.59 


K 


0.96 


0.60 



TABLE I. Stable fixed-points of (i) optimal KL, (ii) non-optimal KG and (iii) incorrect 
KG, using the exponents {n, jl) = (1-2,2) and system parameter set (A'',M) = (1,0.9). 



Because the KL filter is designed for the global control of the uncertainty by 
minimizing the tail covariance, we propose to compare the tails of the distribution of 
errors {x'^ — x*) resulting from the two methods (KL and KG) to assess their relative 
performance. Note that, since the covariance does not exist for /i < 2, it cannot be 
used to evaluate the performance of the heavy-tail KL filtering. 

Figure |^ shows the (complementary) cumulative distribution of (x^ — x*) and 
{x^ — x^), as well as that of {y° / H) — x^ for reference. For this parameter set (A*, M) = 
(1, 0.9), the two optimal KL and model KG gains at their stable fixed-points differ by 
37.5% (Table H). This shows that the model KG filter underestimates the reliability 
of observation and overestimates the value of the forecast. 

Although the difference in the cumulative distributions is rather subtle to deter- 
mine from visual inspection of Figure the cumulative distribution of the error 
between the analysis and the true trajectory obtained from the optimal KL filter 
is consistently below that obtained by using the non-optimal filter, apart from ex- 
pected fluctuations. In probability terms, the optimal KL error distribution exhibit 
the property of being "stochastically dominant" over the model KG error distribu- 
tion. This shows that the optimal KL filter is indeed superior to the model KG filter 
in the presence of heavy tails. 

In fact, our theory predict the difference of 37.5% (Table |) based on the stable 
fixed-point presented in Section p.V G 2| . It is confirmed by the synthetic simulation 
of the Levy random variables with /i = L2 as presented in Figure ^b based on the 
corresponding scale factors of the stable fixed-point W'^ and b^''"^ (Table |). 

The superficial visual effect in Figure ^a can be explained as follows. In the tails. 
Levy laws with exponent /i are power law given by C/ {x^ — x^)^ where C is the scale 
factor of the errors. If C is higher by 37.5% for the non-optimal filtering compared 
to the optimal KL filtering (Table |I[), this represents a significant error reduction. 
However, this will not be strikingly visible in the log-log representation of figure ^, 



since In 1.375 0.32 leads to a translation of the two cumulative distributions by 
only 0.32, hence the small but still visible effect. 

Another more compact way of quantifying the relative performance of the two 
solutions is to calculate a typical error amplitude, which generalizes the covariance. 
Since we have considered the situation where jj, > 1, the average of the absolute 
value {\x^ — x^\) of the errors corresponds to a moment of order 1, which is defined 
mathematically and is numerically well-behaved. Our direct numerical simulations 
show a decrease of the typical error amplitude {{\x^ — x^\)) by approximately 20% 
when going from the non-optimal solution (ps 3.3) to the optimal KL filtering (ps 2.8). 

V. CONCLUSION 

We have presented the solution of the Kalman filter problem for dynamical and 
forecast noises distributed according to power laws and Levy laws. The main theo- 
retical concept that we have used is to optimize the Kalman filter to chisel the tail of 
the distribution of residual errors so as to minimize it globally. In order to implement 
this program, we have introduced the concept of a "tail covariance" that generalizes 
the usual notion of the covariance. The full solution, called the Kalman-Levy filter, 
is obtained by the solution of a general non-linear equation. We have investigated in 
detail the quality of this solution in the univariate case and have shown by direct nu- 
merical experiments that the improvement is significant, all the more so, the heavier 
the tail, i.e. the smaller the power law exponent /x. 
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A. STABLE LEVY LAWS 



The stable laws have been studied and classified by Paul Levy, who discovered 
that, in addition to the Gaussian law, there is a large number of other pdf 's sharing 
the stability condition 

PN{x')dx' = Pi{x)dx where x' — a^x + , (99) 

for some constants gn and b^, where x' is the sum of N independent variables of 
the type x distributed according to the pdf Pi{x). One of their most interesting 
properties is their asymptotic power law behavior. 

A symmetric Levy law centered on zero is completely characterized by two pa- 
rameters which can be extracted solely from its asymptotic dependence 

C 

P{x) ~ for X ±00 . (100) 

C is a positive constant called the tail or scale parameter and the exponent ^ is 
between and 2 (0 < < 2). Clearly, > for the pdf to be normalizable. As for 
the other condition // < 2, a pdf with a power law tail with > 2 has a finite variance 
and thus converges (slowly) in probability to the Gaussian law. It is therefore not 
stable. Only its shrinking tail for // > 2 remains of the power law form. In contrast, 
the whole Levy pdf remains stable for < 2. 

All symmetric Levy law with the same exponent /j, can be obtained from the Levy 
law Ln{x) with the same exponent /i, centered on zero and with unit scale parameter 
C — 1, under translation and rescaling transformations 

P{x)dx = L^{x')dx' where x' = C^^^x + m , (101) 

m being the center parameter. 

Levy laws can be asymmetric and the parameter quantifying this asymmetry is 

/3=(C+-C_)/(C+ + C_) , (102) 



where C± are the scale parameters for the asymptotic behavior of the Levy law for 
X — > ±00. When /3 7^ 0, one defines a unique scale parameter C = (C+ + C_)/2, which 
together with /3 allows one to describe the behavior at x —* ±00. The completely 
antisymmetric case f3 = +1 (resp. —1) corresponds to the maximum asymmetry. 

For < /i < 1 and (3 = ±1, the random variables take only positive (resp. 
negative) values. 

For 1 < /i < 2 and f3 = +1, the Levy law is a power law for x +00 but goes to 
zero for x —00 as P{x) ~ exp(— . This decay is faster than the Gaussian 
law. The symmetric situation is found for (3 = —1. 

An important consequence of ( 100 ) is that the variance of a Levy law is infinite 



as the pdf does not decay sufficiently rapidly at |x| — > 00. When /i < 1, the Levy 
law decays so slowly that even the mean and the average of the absolute value of 
the spread diverge. The median and the most probable value still exist and coincide, 
for symmetric pdf (/? = 0), with the center m. The characteristic scales of the 
fluctuations are determined by the scale parameter C, i.e. they are of the order of 

There are no simple analytic expression of the symmetric Levy stable laws L^(x), 
except for a few special cases. The best known is /i = 1, called the Cauchy (or 
Lorentz) law, 

Li(x) = — for — 00 < X < +00 . (103) 

X^ + 7!"^ 



The Levy law for fi = 1/2 is pO | 



2 exp ( — 77- 

Li/2(x) = for a; > . (104) 



/vr (2x)t 

This pdf Li/2{x) gives the distribution of first returns to origin of an unbiased random 
walk. 

Levy laws are fully characterized by the expression of their characteristic func- 
tions : 



Lf,{k) = exp {-a^\k\^ 



(105) 



where an is a constant proportional to the scale parameter C : 



71 C 



for 1 < n <2. 



(106) 



" fi'^Tifi - 1) sin {^Y] 
A similar expression holds for < /i < 1, while n = 1 and 2 requires a special form 



(see [0 for full details). For /5 7^ 0, we have 



La{k)= exp 



-a^|/i;|^' 1 + i/3tan(/i7r/2) 



1^1, 



for yU 7^ 1 . 



(107) 



For /i = 1, tan(/i7r/2) is replaced by (2/7r) In 



B. PROOF OF THE OPTIMALITY OF THE SOLUTION OF (44) 



To prove the optimality of the solution K° solving p^), it is sufficient to consider 
only one of the system of equations for a single and fixed index i. We thus drop 
the index i and define the matrix f2^'° in IR^^^ and the vector k in IR^ as follows: 



pq 



pg 







^ip 2^rn=l im rap p P Q 

for p 7^ g 



11-2 



0, 



for p 7^ g 



'^q — ^iq 



(108) 



Expression (|^) can then be written as 



(109) 



It is clear in this form that the Hessian matrix is positive definite for /i > 1 because 
the linear sum of symmetric positive definite matrices results in a positive definite 
matrix. 

There are two issues associated with this formulation for the optimal k (i.e., K°): 
1) the possible existence of singular terms for /i < 2; and 2) the solvability of the 
nonlinear system. To address these issues, we rewrite (^if ) as 

d 



dK, 



trace Ba = f(K;, /x) = 0, 



(110) 



and seek for a solution branch K,{fi) of ( 110 ) using implicit function theory as fi varies. 

At /i = 2, the system is linear in k and a unique solution K,{fi = 2) can be obtained 
analytically (this is nothing but the standard linear least- variance estimation). For 
1 < /i < 2, f(K;,/i) is bounded. Its derivative with respect to k, («;,//), is given 



by (|109|) which is always positive definite. It can be singular and diverge to +oo due 
to absolute value terms behaving like lim^j^o la;]'*"^ = oo, as can be seen in equation 



108| ). The derivative of with respect to /i is: 



d 92 ^ 



N 

p=i 



log I ( ^ip ~ X! -^fm^Lp 1 I ) ( '^ip ^ ^im^ 
\ m=l / \ m=l 

+ (log I ^L'^mp) 1) K^mG'rr 

^ \m=l / ^ \m=\ 



'IIV. 



This derivative can also be singular due to the absolute value terms behaving 
like lim^^^oa^''^"^^ log |x| = ±00. Note that ^f(K;,/i) and ^f(K, /x) become sin- 
gular simultaneously, but the former is more singular than the latter because 
limj,.^o \x\^~'^ / x^^~^^ log I a; I = ±00. Implicit function theory can therefore be applied 
to guarantee that a unique solution branch ^(/i) exists for 1 < fi < 2, starting from 
the analytical solution at fi = 2. Indeed, if there exist other solution branches, then 
there must be at least one bifuration as fi varies because the solution at /i = 2 is glob- 
ally unique due to linearity. The fact that is a positive definite matrix globally 
(though it can be singular) however guarantees that there is no bifuration. Accord- 
ingly, the solution branch from fi = 2 provides the unique solution of the system as 
/X varies. 

The solution on the branch can be obtained numerically, either by directly solving 
for the nonlinear equations for each /i or by following the branch using the pseudo 



arc-length continuation method [|15 
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FIG. 1. Dependence of the weight K° given by ( [T^ ) to the second measurement as a 
function of the relative amphtude A (eq.(^) of the noise of the two measurements: a small 
(resp. large) value of A corresponds to a small (resp. large) error on the second measurement 
relative to the first one. The different curves correspond to different tail exponents fi. 



Variance 
1 .5 



Power law scheme 
in the limit JI — +00 



Power law scheme for 
H=3 




FIG. 2. Dependence of the variance Varl" and Var^ of the total error obtained using 
respectively the Levy and the Gaussian weights, as a function of A = ^ equal to the ratio 
of the typical widths of the Student's distributions for the two measurements. 
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FIG. 3. Dependence of the scale factors and of the total error obtained using 



respectively K° and weights, as a function of A. 
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a) Map (A,^M)=(1,0.9) 




d) n=2 (K'=0.60) e) |x=3 (K'=0.52) 




FIG. 4. a) Graphs of bl and bl as function of for (A^M) = (1,0.9). For both 
6^ and b^, the hnes represent the corresponding maps and the symbols are the stable 
fixed-points; sohd, dash, dash-dot, and dash-dot-dot lines, as well as circle, square, dia- 
mond, and triangle symbols correspond to /i = 1.2, 1.5, 2 and 3, respectively; b) error 
probability distribution for the stable fixed-point corresponding to fi = 1.2 and parameters 
{x^,y°) = (0, 1) and H = 1, so that x'^ = K^; c, d and e) same as b but corresponding to 
12 = 1.5, 2, and 3, respectively. 
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FIG. 5. Stable fixed-point of tlie Kalman-Levy filter as a function of and M : 



(6 ) , b) (5^) and c) K for ^ = 1.2; d), e) and f) same as a), b) and c) but for for /U = 1 
g), h) and i) same as a), b) and c) but for for fi = 2. 



FIG. 6. Stable fixed-point of the Kalman-Gaussian filter applied to a heavy-tail system 

with fi = 1.2: a) (6 ) , b) (6^) , c) difference (6 ) — (6 ) between the non-optimal filtering 

~f i -f i i-i 

solution [b shown in a) and the optimal KL filtering solution (6 shown in Figure 

d) difference (6^)'^ — (6^)'^ between the non-optimal filtering solution (b^)'^ shown in b) 

and the optimal KL filtering solution (b^)'i^ shown in Figure ^3, e) difference K'-' — 

between the non-optimal filtering solution and the model filtering solution, f) difference 

— (b^)'^ between the non-optimal filtering solution (6^)'' shown in b) and the model 

filtering solution {b , g) difference (6^) — (b^) a between the non-optimal filtering solution 

" i ~ i 

(6^) shown in b) and the mode filtering solution (6^) a . 
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FIG. 8. Numerical experiment for the parameter set (A**, M) = (1, 0.9) with n = 1.2; a) 
evolution of true state variable x\. for k = 1, . . . , 10000. Note the occurrence of a few large 
peaks corresponding to rare but extreme noise fluctuations distributed with the Levy distri- 
bution. Panels b) and c) show a magnification of panel a) in the time interval [1000, 1050] 
and compare this true dynamics (continuous line) with the forecasts x-^ (squares), the ob- 
servations y° (circles) and the assimilation analysis (diamonds). Panel b) corresponds 
to the use of the optimal KL filtering while panel c) corresponds to the the model filtering 
with /i = 2, i.e. standard Kalman-Gaussian filter. It appears clear by visual inspection 
that the optimal KL analysis is much closer more often than not to the true dynamics. 
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FIG. 9. a) (Complementary) cumulative distribution (i.e. number of time steps where 



the error is larger than a value read on the abscissa) of the error between the analysis and 
the true trajectory for (A^,M) = (1,0.9) and /x = 1.2, obtained by using the optimal KL 
filter {x^ — x^; in thick solid line) and model KG filter K}^ (x^ — x*; in thick dashed line). 
We observe clearly that the distribution of — is below that of x'^ — x*, i.e. the errors 
are globally reduced in distribution by application of the Kalman-Levy method compared 
to the standard Kalman-Gaussian method. We also show the cumulative distributions of 
the difference between forecast and true trajectory for the optimal KL filter (x^ — x*; 
in solid line) and model KG filter K]^ (x^ — x*; in dashed line), as well as the cumulative 
distributions of the observations {y° / H — x*; in dotted line, which is almost identical to 
x^ — X* and thus hardly visible due to the thickness of the lines), b) (Complementary) 
cumulative distribution of the synthetic simulation of random Levy variables based on the 
scale-factors of the stable fixed-points h and h (Table 0) predicted by the theory. The 
cumulative distributions of the same observations {y°/H — x^; in dotted line) as in a) is also 
shown for reference. This alternative method for constructive the distributions of errors 
shows the full consistency of the approach. 



